In biomedical research, expectation bias, also referred to as the Rosenthal effect, is the distortion of an experiment's results caused by the expectations that the investigator, or the patient, holds about those results. The classical way to limit the influence of this bias on the interpretation of results is blinding, or masking, of treatments; both terms refer to keeping patients, investigators or assessors unaware of the assigned treatment. Speaking of expectations, can we postulate that expectation bias grows with the unmet medical need underlying the research? And, if so, is expectation bias stronger in oncology studies? This postulate does not seem to apply to patients, since expectation (which acts in patients via the placebo effect) is rarely associated with positive tumour responses.1 But what about investigators? Judging from the recommendations of the main regulatory bodies, investigator expectation is indeed considered an important confounder in the interpretation of outcomes, one that requires a blinded independent review to prevent bias. In fact, the current Food and Drug Administration (FDA) guideline on clinical trial endpoints for the approval of cancer drugs and biologics recommends an independent blinded review for all endpoints based on tumour assessments, namely disease-free survival, event-free survival, overall response rate (ORR), complete response, time to progression and progression-free survival (PFS), although for disease-free survival, event-free survival, time to progression and PFS the decision should be taken on a case-by-case basis.
Likewise, the most recent European Medicines Agency (EMA) guideline, still in draft form, recommends that “if the study has to be conducted open label, this has implications with respect to choice of study endpoints, independent review, conduct of sensitivity analyses and other measures to be undertaken to limit potential bias related to the open-label nature of the trial”. We cannot say to what extent the FDA and EMA recommendations were based on the analysis of clinical evidence and to what extent on methodological reasoning. As we will see below, some evidence is available, but it is limited to PFS in phase-3 trials. What is certain is that a quantitative measure of the influence of investigator expectation on the results of oncology trials can now be obtained, since many papers have been published that report, within the same trial, the assessment of 2 pivotal endpoints, PFS and ORR, carried out both at local level (local assessment, LA) and by a blinded independent central review (BICR). When the assessment is conducted locally, the local investigator, that is, the oncologist, is aware of the assigned treatment and in contact with the radiologist, even though not directly involved in the imaging assessment. Local assessors are, however, blinded to the outcomes of the BICR assessment, and the BICR maintains masking of the assigned treatment. By examining the differences, if any, between LA and BICR assessments in an adequately sized sample of clinical trials, one can estimate the weight of expectation bias in this setting. Over the last 2 years, we have carried out extensive research on investigator expectation bias in oncology trials. We collected and analysed all phase-2 and phase-3 trials recorded in the clinicaltrials.gov and EudraCT databases that reported the results of PFS and/or ORR assessments carried out by both LAs and BICR within the same trial.
First, we focused on PFS in phase-3 trials, a topic that has been a matter of debate for more than a decade. Dodd et al. first analysed 7 phase-3 trials showing no difference between LA and BICR assessments of PFS, and questioned BICR as an unnecessary, expensive and time-consuming procedure that should not be used routinely in confirmatory phase-3 trials.2 Soon after, researchers from a consortium of pharmaceutical companies extended this initial observation, reporting an analysis of 27 phase-3 clinical trials in which PFS was estimated by both LAs and BICR.3, 4 These authors found a strong correlation (R = .947) between LA and BICR assessments of PFS and concluded that LA provides a reliable estimate of PFS, making the blinding of assessors unnecessary.3, 4 Based on this evidence, a series of recommendation papers were published suggesting that the blinded independent evaluation of PFS could be limited or even abandoned.5-7 A practical effect of this effort by the scientific community can be found in the FDA guideline, where blinded independent review of PFS is ‘not always recommended’, whereas the EMA guideline was not influenced by these proposals. Against this background, our first study looked for possible discrepancies between LA and BICR assessments of PFS in phase-3 trials.8 These trials usually compare the experimental treatment with a control therapy, so results are expressed as the hazard ratio (HR) between the PFS curves. In a sample of 28 randomized controlled trials, we calculated a discrepancy index, defined as the ratio between the HR assessed by LAs and the HR assessed by BICR.
Under the null hypothesis of no difference between the 2 estimates, the expected discrepancy index is 1; in this study, we obtained an average discrepancy index of 0.98 (95% confidence interval: 0.927–1.032), thereby confirming, although with a different methodological approach, the previous findings by Amit et al.4 Notably, the samples of the 2 studies were largely independent of each other, with only three trials in common. Taken together, the evidence from our group and others consistently indicates that there is no significant difference between PFS assessed under blinded and open-label conditions, and hence that investigator expectation does not influence the measurement of PFS in phase-3 trials. We then investigated possible differences between LA and BICR assessments of ORR in phase-2 trials.9 Overall response rate is often chosen as the primary endpoint in early clinical development to obtain preliminary evidence of efficacy. In this second study, 20 trials including the assessment of ORR by both LAs and BICR were selected; the total number of comparisons analysed was 33, as several trials included more than 1 comparison. Since the protocols involved single groups of patients, HRs were not available, and the discrepancy index was calculated as the ratio between each LA-assessed ORR and the corresponding BICR-assessed ORR.9 Again, the null hypothesis of no difference corresponds to an expected discrepancy index of 1, with values >1 indicating a more optimistic evaluation by LAs and values <1 a more conservative one. Unlike the PFS study, here we found an average discrepancy index of 1.175 (95% confidence interval: 1.083–1.264).
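The discrepancy index described above is a simple ratio of a local estimate to the corresponding BICR estimate. The Python sketch below illustrates the calculation under stated assumptions: the average is taken as an arithmetic mean with a normal-approximation 95% confidence interval, which may differ from the interval method actually used in the cited studies (a t-based or log-scale interval would be equally plausible). The function names `discrepancy_index` and `mean_with_ci` and all ORR values are hypothetical and are not data from any trial in the dataset.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def discrepancy_index(la_value: float, bicr_value: float) -> float:
    """Ratio of a locally assessed estimate to the BICR estimate of the same quantity.

    For phase-3 PFS the inputs are hazard ratios (HR_LA / HR_BICR);
    for single-group phase-2 ORR they are response rates (ORR_LA / ORR_BICR).
    A value of 1 means the two assessments agree.
    """
    if bicr_value <= 0:
        raise ValueError("BICR estimate must be positive")
    return la_value / bicr_value


def mean_with_ci(values: Sequence[float], level: float = 0.95):
    """Arithmetic mean with a normal-approximation confidence interval.

    Assumption for illustration only: the cited studies may have used another method.
    """
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Hypothetical (LA, BICR) ORR pairs, NOT taken from any trial.
orr_pairs = [(0.45, 0.38), (0.30, 0.27), (0.52, 0.44), (0.20, 0.19), (0.61, 0.50)]
indices = [discrepancy_index(la, bicr) for la, bicr in orr_pairs]
m, lo, hi = mean_with_ci(indices)

# Under the null hypothesis of no difference the index is 1;
# an interval lying entirely above 1 points to more optimistic local assessments.
print(f"mean index = {m:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

The same `discrepancy_index` helper applies to phase-3 PFS by passing each trial's LA and BICR hazard ratios instead of response rates.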
In 18 of the 33 comparisons, both ORR and PFS were assessed; in this subgroup, the average discrepancy index for PFS was 1.092 (95% confidence interval: 0.96–1.22), and no significant correlation was found between ORR and PFS.9 In conclusion, an average overestimate of 17.5% by unblinded local assessors strongly suggested that, at least in uncontrolled phase-2 studies, expectation bias can significantly influence the results. This conclusion raised the question of why investigator expectation should influence the assessment of ORR but not that of PFS, given that both endpoints are obtained through the same methodological approach, that is, imaging evaluation and patient categorization according to validated criteria. To address this point, we designed a third study to analyse the differences in the assessment of ORR and PFS that could account for the discrepant results of the 2 previous studies. In this study we also completed the analysis of our database, examining possible discrepancies between LA and BICR assessments of PFS in phase-2 trials and of ORR in phase-3 controlled trials.10 In phase-2 trials, we confirmed the lack of a significant difference between LA and BICR assessments of PFS, which led us to conclude that investigator expectation does not influence the measurement of PFS in either phase-2 or phase-3 trials. We also found that, in phase-3 trials as well, LAs tend to overestimate ORR compared with BICR.
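Whether a local overestimate of ORR distorts a between-arm comparison depends on whether it is the same in both arms. The hypothetical sketch below assumes the overestimate is a constant multiplicative factor applied to the BICR response rate, and uses the ratio of response rates between arms as a stand-in effect measure (a simplifying assumption made here for illustration). Under these assumptions the factor cancels exactly when it is equal in both arms, whereas an overestimate confined to the experimental arm, which is what expectation bias would predict, moves the discrepancy index away from 1. With an additive overestimate (a fixed number of percentage points) or an odds-ratio comparison, the cancellation would only be approximate. All names and numbers are illustrative.

```python
def between_arm_ratio(orr_experimental: float, orr_control: float) -> float:
    """Ratio of response rates, experimental over control (illustrative effect measure)."""
    if orr_control <= 0:
        raise ValueError("control response rate must be positive")
    return orr_experimental / orr_control


def inflate(orr: float, factor: float) -> float:
    """Hypothetical local over-reading: the BICR response rate scaled by a constant factor."""
    return orr * factor


# Hypothetical BICR response rates, NOT from any trial.
bicr_exp, bicr_ctl = 0.40, 0.25
bicr_ratio = between_arm_ratio(bicr_exp, bicr_ctl)

# Same relative overestimate in both arms: the factor cancels in the ratio.
la_ratio_equal = between_arm_ratio(inflate(bicr_exp, 1.15), inflate(bicr_ctl, 1.15))

# Overestimate confined to the experimental arm, as expectation bias would predict.
la_ratio_biased = between_arm_ratio(inflate(bicr_exp, 1.15), bicr_ctl)

print(la_ratio_equal / bicr_ratio)   # discrepancy index close to 1
print(la_ratio_biased / bicr_ratio)  # discrepancy index close to 1.15
```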
However, this overestimate was similar in experimental and control groups; therefore, when the results are expressed as HRs, the 2 errors tend to offset each other, and the comparison between LA and BICR assessments shows no significant difference.10 This finding is highly relevant to our working hypothesis because it argues against a role of expectation bias in the assessment of ORR by local investigators: if expectation were at work, the overestimate should be significantly larger in experimental groups than in controls. Instead, these results suggest that a methodological bias, inherent in the measurement of ORR, may account for the discrepancy between the ORR and PFS findings. In the same paper, we attempted to analyse this phenomenon and identified the following possible determinants: (i) the limited number of measurements for ORR, compared with the repeated measures for PFS; (ii) the time-to-response, which is a variable for ORR, whereas PFS is always measured after a response is established; (iii) the type of treatment, with small molecules generally inducing faster responses than immunotherapies; and (iv) differences between protocols, some of which assess ORR at fixed times while others use the best response to calculate ORR.10 In conclusion, having completed the analysis of the whole dataset of phase-2 and phase-3 trials reporting LA and BICR assessments of PFS and ORR, we can now attempt to answer the question posed in the title of this paper: (i) concerning PFS, it is now well established that investigator expectation bias does not influence the results of assessment, in either phase-2 or phase-3 studies; (ii) likewise, for ORR, LA and BICR assessments yield comparable between-arm comparisons, provided that trials have a control group; (iii) a strong concern remains about the assessment of ORR in single-group nonrandomized phase-2 trials, since a significant difference between LA and BICR assessments is recorded in this setting. This conclusion is of special interest, considering that an increasing number of oncology drugs are now granted conditional approval on the basis of evidence from nonrandomized phase-2 studies.

Cinzia Dello Russo and Pierluigi Navarra declare no conflict of interest.

Open Access Funding provided by Università Cattolica del Sacro Cuore within the CRUI-CARE Agreement.

PN and CDR conceived the paper. PN drafted the manuscript. CDR critically reviewed the manuscript.

This work received no financial support.